Session 9: Interactive Visualization with Altair¶

Background on Altair¶

  • Visualization library for Python
  • Takes a declarative approach to data visualization: you specify relationships between the data and the output (e.g., "map x to a position and y to a color") rather than how each element should be drawn ("put a red circle here and a blue circle there"), which keeps plotting code concise*

*Source: https://altair-viz.github.io/altair-tutorial/README.html

Basic building block: the mark¶

  • The basic building block is a mark: a type of marker for which you then specify encodings such as x, y, and color, plus any interactivity
  • Types of marks:

    • mark_point()
    • mark_circle()
    • mark_square()
    • mark_line()
    • mark_area()
    • mark_bar()
    • mark_tick()
  • Illustrating example with WHO data on country-year-level life expectancy

Reading in data¶

In [43]:
import pandas as pd
import numpy as np
import altair as alt
from altair import datum

who = pd.read_csv('Life Expectancy Data.csv')
who.head()
who.columns = [col.strip().lower() for col in 
          who.columns]
who.columns
who.year.value_counts()
Out[43]:
2013    193
2015    183
2014    183
2012    183
2011    183
2010    183
2009    183
2008    183
2007    183
2006    183
2005    183
2004    183
2003    183
2002    183
2001    183
2000    183
Name: year, dtype: int64
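
The column clean-up step matters because later cells refer to columns by exact name (e.g. 'life expectancy'). A minimal sketch of what strip() and lower() do, assuming raw headers with stray spaces and mixed case; the headers below are hypothetical, since the original file's headers are not shown here:

```python
import pandas as pd

# Hypothetical raw headers with stray spaces and mixed case
raw = pd.DataFrame(
    [[2013, 78.9, 13.2]],
    columns=["Year", "Life expectancy ", " Schooling"],
)

# Same clean-up as in the cell above: trim whitespace, then lowercase
cleaned = raw.copy()
cleaned.columns = [col.strip().lower() for col in cleaned.columns]

print(list(raw.columns))
print(list(cleaned.columns))
```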

Basic mark_point() with no encoding¶

In [ ]:
alt.Chart(who).mark_point()

Adding encoding to the mark_point()¶

In [11]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy'
)
Out[11]:

Cleaning up¶

In [16]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy',
  color = 'status'
).configure_axis(
    grid=False
)
Out[16]:

Further customizing¶

  • To further customize, switch syntax within encode() from:
    • x = variable name; y = variable name; color = variable name; etc
  • To:
    • x = alt.X(), y = alt.Y(), color = alt.Color() with the parentheses then holding further customizations

Further customizing: colors¶

In [18]:
domain = ['Developing', 'Developed']
colors = ['seagreen', '#7D3C98']

alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = 'schooling',
  y = 'life expectancy',
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
)
Out[18]:

Further customizing: X and Y labels¶

In [19]:
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
)
Out[19]:

Bar charts: mark_bar()¶

  • Similar to stat = 'identity' in ggplot, can create bar charts using a specific column of the dataset
  • Can also do transformations to create the values displayed in bars within the code itself. See here for a list: https://altair-viz.github.io/user_guide/transform/index.html
  • Our example will also show the value of using explicit variable type encodings rather than relying on altair's detection of type of data:
    • Q: quantitative
    • O: ordinal
    • N: nominal
    • T: temporal
    • G: geojson

Example of an identity bar chart: don't declare types¶

In [33]:
who_subset = who[(who.country.isin(['United States of America', 'Canada', 'Mexico'])) &
                (who.year > 2004)].copy()
alt.Chart(who_subset).mark_bar().encode(
    x = alt.X('year', title = "Year"),
    y = alt.Y('life expectancy', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
)
Out[33]:

Example of an identity bar chart: declare types¶

In [34]:
alt.Chart(who_subset).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    y = alt.Y('life expectancy:Q', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
)
Out[34]:

Example of a transformation-based bar chart: mean by group¶

In [40]:
alt.Chart(who[who.year > 2009]).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    xOffset = "status:N",
    y = alt.Y('avg_life:Q', title = "Average Life expectancy"),
    color = alt.Color('status:N', title = "")
).transform_aggregate(
    avg_life = 'mean(life expectancy)',
    groupby = ['status', 'year']
)
Out[40]:

Example of a transformation-based bar chart: filter within chart itself¶

In [51]:
# Note the chained filters: each transform_filter() is applied in turn,
# so a row must satisfy both conditions to be plotted
alt.Chart(who).mark_bar().encode(
    x = alt.X('year:O', title = "Year"),
    y = alt.Y('life expectancy:Q', title = "Life expectancy"),
    xOffset="country:N",
    color = alt.Color('country:N', title = "")
).transform_filter(
    alt.FieldOneOfPredicate(field = 'country',
                            oneOf = ["Canada", "Mexico",
                                     "United States of America",
                                     "Cuba"])
).transform_filter(
    alt.FieldGTPredicate(field = 'year', gt = 2009)
)
Out[51]:

Different types of interactivity¶

  • Tooltips: hovering over points to bring up information
  • Selections:
      - Allow users to select an interval range of the chart
      - Allow users to select a single point
      - Allow users to select multiple points

Illustrating tooltips¶

In [56]:
c = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors),
  tooltip = [alt.Tooltip('country', title = "Country:"),
            alt.Tooltip('life expectancy', title = "Life exp:"),
            alt.Tooltip('schooling', title = 'Years schooling:')]
).configure_axis(
    grid=False
).interactive()
c
Out[56]:

Illustrating selections¶

  • Can attach a selection to a chart with add_params() (called add_selection() before Altair 5), e.g. to select a certain region of points
  • Can add code to the main chart to make the chart respond to the selection

In [57]:
brush = alt.selection_interval()
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
).configure_axis(
    grid=False
).add_params(
    brush
)
Out[57]:

In [74]:
brush = alt.selection_interval()
alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.condition(brush, 'status:N', alt.value('lightgray'))
).configure_axis(
    grid=False
).add_params(
    brush
)
Out[74]:

Interactivity across multiple charts¶

  • Altair also gives us the ability to have selections on one chart propagate through to other charts
  • It does this using the transform_filter() that we outlined earlier
  • Can use this to explore correlations across multiple variables

Step one: create multiple charts¶

In [91]:
scatter_schooling = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
)

scatter_gdp = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('gdp', title = "GDP"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.Color('status').scale(domain = domain, range = colors)
)

scatter_schooling | scatter_gdp
Out[91]:

Step two: add selection on one chart and filtering on another¶

In [94]:
brush = alt.selection_interval()
scatter_schooling = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = alt.condition(brush, 'status:N', alt.value('lightgray'))
).add_params(
    brush
)
scatter_gdp = alt.Chart(who[who.year == 2013]).mark_point().encode(
  x = alt.X('gdp', title = "GDP"),
  y = alt.Y('life expectancy', title = "Life expectancy (2013)"),
  color = 'status:N'
).transform_filter(
    brush
)
scatter_schooling | scatter_gdp
Out[94]:

Another example: filtering by year¶

In [97]:
select_year = alt.selection_interval(encodings = ['x'])

bar_slider = alt.Chart(who).mark_bar().encode(
    x = 'year:O',
    y = 'count()'
).add_params(select_year)

scatter_schooling = alt.Chart(who).mark_point().encode(
  x = alt.X('schooling', title = "Average years of schooling"),
  y = alt.Y('life expectancy', title = "Life expectancy"),
  color = alt.condition(select_year, 'status:N', alt.value('lightgray')),
  opacity = alt.condition(select_year, alt.value(0.8), alt.value(0.1))
)

scatter_schooling & bar_slider
Out[97]:

Summing up¶

  • Reviewed general syntax for visualizations in altair:
    • mark as the basic building block
    • encode to specify mappings to x, y, and color
  • Use of transform_aggregate and transform_filter to transform data within the plotting call itself
  • Two types of interactivity:
    • Tooltips
    • Selections -> can propagate through to multiple charts